116 research outputs found

Federated Survival Forests

Author: Archetti Alberto
Matteucci Matteo
Publication venue: IEEE
Publication date: 01/01/2023

Survival analysis is a subfield of statistics concerned with modeling the occurrence time of a particular event of interest for a population. Survival analysis has found widespread applications in healthcare, engineering, and the social sciences. However, real-world applications involve survival datasets that are distributed, incomplete, censored, and confidential. In this context, federated learning can substantially improve the performance of survival analysis applications. Federated learning provides a set of privacy-preserving techniques to jointly train machine learning models on multiple datasets without compromising user privacy, leading to better generalization performance. However, despite the widespread development of federated learning in recent AI research, few studies focus on federated survival analysis. In this work, we present a novel federated algorithm for survival analysis based on one of the most successful survival models, the random survival forest. We call the proposed method Federated Survival Forest (FedSurF). With a single communication round, FedSurF obtains discriminative power comparable to that of deep-learning-based federated models trained over hundreds of federated iterations. Moreover, FedSurF retains all the advantages of random forests, namely low computational cost and natural handling of missing values and incomplete datasets. These advantages are especially desirable in real-world federated environments with multiple small datasets stored on devices with low computational capabilities. Numerical experiments compare FedSurF with state-of-the-art survival models in federated networks, showing that FedSurF outperforms deep-learning-based federated algorithms in realistic environments with non-identically distributed data.
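The abstract reports discriminative power without naming the metric; in survival analysis it is most often measured with Harrell's concordance index (C-index), though that choice is an assumption here. As a reminder of what the number means, the minimal, dependency-free sketch below computes the C-index for right-censored data. The function name `harrell_c_index` is hypothetical and is not part of FedSurF; pairs with tied times are simply skipped in this simplified version.

```python
from typing import Sequence


def harrell_c_index(
    times: Sequence[float], events: Sequence[bool], risk_scores: Sequence[float]
) -> float:
    """Harrell's concordance index for right-censored data (simplified sketch).

    A pair (i, j) is comparable when subject i has the shorter time and
    experienced the event. The pair is concordant when subject i also has
    the higher predicted risk; tied risk scores count as half a concordance.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored subject cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable
```

A value of 1.0 means the model ranks every comparable pair correctly, 0.5 corresponds to random ranking. Libraries such as scikit-survival provide tested implementations that also handle tied event times.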

PORTO@iris (Publications Open Repository TOrino - Politecnico di Torino)

Scaling Survival Analysis in Healthcare with Federated Survival Forests: A Comparative Study on Heart Failure and Breast Cancer Genomics

Author: Archetti Alberto
Ieva Francesca
Matteucci Matteo
Publication date: 01/01/2023

Survival analysis is a fundamental tool in medicine, modeling the time until an event of interest occurs in a population. However, in real-world applications, survival data are often incomplete, censored, distributed, and confidential, especially in healthcare settings where privacy is critical. The scarcity of data can severely limit the scalability of survival models to distributed applications that rely on large data pools. Federated learning is a promising technique that enables machine learning models to be trained on multiple datasets without compromising user privacy, making it particularly well suited to addressing the challenges of survival data and large-scale survival applications. Despite significant developments in federated learning for classification and regression, many directions remain unexplored in the context of survival analysis. In this work, we propose an extension of the Federated Survival Forest algorithm, called FedSurF++. This federated ensemble method constructs random survival forests in heterogeneous federations. Specifically, we investigate several new methods for sampling trees from client forests and compare the results with state-of-the-art survival models based on neural networks. The key advantage of FedSurF++ is that it achieves performance comparable to existing methods while requiring only a single communication round. Our extensive empirical investigation shows significant improvements from both the algorithmic and the privacy-preservation perspectives, making FedSurF++ more efficient, robust, and private than the original FedSurF algorithm. We also present results on two real-world datasets demonstrating the success of FedSurF++ in real-world healthcare studies. Our results underscore the potential of FedSurF++ to improve the scalability and effectiveness of survival analysis in distributed settings while preserving user privacy.
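The abstract says FedSurF++ assembles a global forest from trees sampled out of client forests in one communication round, but it does not spell out the sampling rules. The sketch below only illustrates that general shape under stated assumptions: each client sends its trees once with a hypothetical per-tree quality score (for example a validation C-index), the server draws a fixed number of trees either uniformly or with probability proportional to that score, and the global forest predicts by averaging the trees' survival curves on a shared time grid, which is a common ensembling choice. The tree interface and all function names are hypothetical, not the authors' API.

```python
import random
from typing import Callable, List, Sequence

# Hypothetical tree interface: maps a feature vector to survival
# probabilities evaluated on a time grid shared by all clients.
Tree = Callable[[Sequence[float]], List[float]]


def sample_global_forest(
    client_forests: List[List[Tree]],
    client_scores: List[List[float]],
    num_trees: int,
    strategy: str = "weighted",
    seed: int = 0,
) -> List[Tree]:
    """Draw num_trees trees, with replacement, from the pooled client trees."""
    rng = random.Random(seed)
    pool = [tree for forest in client_forests for tree in forest]
    if strategy == "uniform":
        # Every received tree is equally likely to be selected.
        return [rng.choice(pool) for _ in range(num_trees)]
    if strategy == "weighted":
        # Selection probability proportional to each tree's (assumed) score.
        weights = [s for scores in client_scores for s in scores]
        return rng.choices(pool, weights=weights, k=num_trees)
    raise ValueError(f"unknown strategy: {strategy}")


def predict_survival(forest: List[Tree], x: Sequence[float]) -> List[float]:
    """Average the survival curves predicted by each tree in the forest."""
    curves = [tree(x) for tree in forest]
    return [sum(values) / len(curves) for values in zip(*curves)]
```

Only the trees and their scores travel to the server, once, which is what makes a single-round protocol possible. Sampling without replacement or weighting by client dataset size would be equally plausible variants.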

arXiv.org e-Print Archive

Archivio istituzionale della ricerca - Politecnico di Milano

PORTO@iris (Publications Open Repository TOrino - Politecnico di Torino)

Wave and Hydrodynamics Modelling in Coastal Areas with TELEMAC and MIKE21

Author: Archetti Renata
Lamberti Alberto
Samaras Achilleas G.
Vacchi Matteo
Publication date: 01/01/2013
Field of study

Hydrodynamic

Hydraulic Engineering Repository

Heterogeneous Datasets for Federated Survival Analysis Simulation

Author: Alberto Archetti
André Martin
Eugenio Lomurno
Francesco Lattari
Matteo Matteucci
Publication date: 01/01/2023

This repository contains three algorithms for constructing realistic federated datasets for survival analysis. Each algorithm starts from an existing non-federated dataset and assigns each sample to a specific client in the federation. The algorithms are:

- uniform_split: assigns each sample to a random client with uniform probability;
- quantity_skewed_split: assigns each sample to a random client according to the Dirichlet distribution [3, 4];
- label_skewed_split: assigns each sample to a time bin, then assigns a set of samples from each bin to the clients according to the Dirichlet distribution [3, 4].

For more information, please see our paper at https://arxiv.org/abs/2301.12166 [1].

Content

- federated_survival_datasets.zip: the content of the repository at https://github.com/archettialberto/federated_survival_datasets
- Heterogheneous_Datasets_for_Federated_Survival_Analysis_Simulation.pdf: the conference paper describing the work.

Installation

Federated Survival Datasets is built on top of numpy and scikit-learn. To install these libraries, run pip install -r requirements.txt. To import survival datasets into your project, we strongly recommend SurvSet (https://github.com/ErikinBC/SurvSet) [2], a comprehensive collection of more than 70 survival datasets.
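The three splitting strategies described above can be illustrated with the Python standard library alone. This is a hypothetical sketch, not the repository's code: the function names differ from uniform_split, quantity_skewed_split, and label_skewed_split, the functions return a client index per sample rather than per-client (X, y) arrays, the Dirichlet draw is built from normalized `random.gammavariate` samples, and the label-skewed variant uses equal-width time bins, which is an assumption since the description does not say how bins are formed.

```python
import random
from typing import List, Sequence


def dirichlet(alpha: float, k: int, rng: random.Random) -> List[float]:
    """Sample from a symmetric Dirichlet(alpha) via normalized Gamma draws."""
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(k)]
    total = sum(draws)
    return [d / total for d in draws]


def uniform_assign(n_samples: int, num_clients: int, seed: int = 0) -> List[int]:
    """Each sample goes to a client chosen uniformly at random."""
    rng = random.Random(seed)
    return [rng.randrange(num_clients) for _ in range(n_samples)]


def quantity_skewed_assign(
    n_samples: int, num_clients: int, alpha: float = 1.0, seed: int = 0
) -> List[int]:
    """Draw client proportions once from Dirichlet(alpha), then assign samples."""
    rng = random.Random(seed)
    proportions = dirichlet(alpha, num_clients, rng)
    return rng.choices(range(num_clients), weights=proportions, k=n_samples)


def label_skewed_assign(
    times: Sequence[float],
    num_clients: int,
    num_bins: int = 10,
    alpha: float = 1.0,
    seed: int = 0,
) -> List[int]:
    """Bin samples by event/censoring time, then split each bin by Dirichlet proportions."""
    rng = random.Random(seed)
    lo, hi = min(times), max(times)
    width = (hi - lo) / num_bins or 1.0  # guard against all-equal times
    bins: List[List[int]] = [[] for _ in range(num_bins)]
    for idx, t in enumerate(times):
        bins[min(int((t - lo) / width), num_bins - 1)].append(idx)

    assignment = [0] * len(times)
    for members in bins:
        rng.shuffle(members)
        proportions = dirichlet(alpha, num_clients, rng)
        start, cumulative = 0, 0.0
        for client, p in enumerate(proportions):
            cumulative += p
            # The last client takes the remainder so every sample is assigned.
            last = client == num_clients - 1
            end = len(members) if last else round(cumulative * len(members))
            for idx in members[start:end]:
                assignment[idx] = client
            start = end
    return assignment
```

Smaller alpha values concentrate the Dirichlet mass on fewer clients, so the federation becomes more heterogeneous; large alpha values approach an even split.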
Usage

```python
import numpy as np
import pandas as pd
from federated_survival_datasets import label_skewed_split

# import a survival dataset and extract the input array X and the output array y
df = pd.read_csv("metabric.csv")
X = df[[f"x{i}" for i in range(9)]].to_numpy()
y = np.array(
    [(e, t) for e, t in zip(df["event"], df["time"])],
    dtype=[("event", bool), ("time", float)],
)

# run the splitting algorithm
client_data = label_skewed_split(num_clients=8, X=X, y=y)

# check the number of samples assigned to each client
for i, (X_c, y_c) in enumerate(client_data):
    print(f"Client {i} - X: {X_c.shape}, y: {y_c.shape}")
```

We provide an example notebook in the zipped folder to illustrate the proposed algorithms. It requires scikit-survival, seaborn, and pandas.

References

[1] Archetti, A., Lomurno, E., Lattari, F., Martin, A., & Matteucci, M. (2023). Heterogeneous Datasets for Federated Survival Analysis Simulation. arXiv preprint arXiv:2301.12166.
[2] Drysdale, E. (2022). SurvSet: An open-source time-to-event dataset repository. arXiv preprint arXiv:2203.03094.
[3] Hsu, T. M. H., Qi, H., & Brown, M. (2019). Measuring the effects of non-identical data distribution for federated visual classification. arXiv preprint arXiv:1909.06335.
[4] Li, Q., Diao, Y., Chen, Q., & He, B. (2022). Federated learning on non-IID data silos: An experimental study. In 2022 IEEE 38th International Conference on Data Engineering (ICDE) (pp. 965-978). IEEE.

Archivio istituzionale della ricerca - Politecnico di Milano

NEUROSURGERY ENTHUSIASTIC WOMEN SOCIETY

The Bi-objective Long-haul Transportation Problem on a Road Network

Author: Archetti Claudia
Jabali Ola
Mor Andrea
Simonetto Alberto
Speranza M. Grazia
Publication venue: Elsevier BV
Publication date: 01/01/2021

In this paper we study a long-haul truck scheduling problem in which a path must be determined for a vehicle traveling from a specified origin to a specified destination. We consider refueling decisions along the path while accounting for heterogeneous fuel prices in a road network. Furthermore, the path has to comply with Hours of Service (HoS) regulations. Therefore, a path is defined by the actual road trajectory traveled by the vehicle, as well as the locations where the vehicle stops to refuel, to comply with HoS regulations, or both. This setting is cast as a bi-objective optimization problem that minimizes both fuel cost and path duration. An algorithm is proposed to solve the problem on a road network. The algorithm builds a set of non-dominated paths with respect to the two objectives. Given the enormous theoretical size of the road network, the algorithm follows an interactive path construction mechanism. Specifically, the algorithm dynamically interacts with a geographic information system to identify the relevant potential paths and stop locations. Computational tests are performed on real-sized instances where the distance covered ranges from 500 to 1500 km. The algorithm is compared with solutions obtained from a policy mimicking the current practice of a logistics company. The results show that the non-dominated solutions produced by the algorithm significantly dominate those generated by the current practice in terms of fuel costs, while achieving similar path durations. The average number of non-dominated paths is 2.7, which allows decision makers to inspect the proposed alternatives visually.
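The algorithm keeps a set of non-dominated paths with respect to fuel cost and duration, both minimized. The dominance test itself is simple, and the sketch below filters a list of candidate (fuel cost, duration) pairs down to that set. It does not reproduce the paper's interactive path construction or its GIS queries; the function names and the candidate values in the usage example are made up for illustration.

```python
from typing import List, Tuple

Path = Tuple[float, float]  # (fuel_cost, duration), both to be minimized


def dominates(a: Path, b: Path) -> bool:
    """a dominates b if it is no worse in both objectives and strictly better in one."""
    return a[0] <= b[0] and a[1] <= b[1] and (a[0] < b[0] or a[1] < b[1])


def non_dominated(paths: List[Path]) -> List[Path]:
    """Return the Pareto front sorted by fuel cost, with duplicates collapsed."""
    front: List[Path] = []
    # After sorting by (cost, duration), a candidate can only be dominated by an
    # earlier one, so it survives exactly when it beats the best duration so far.
    best_duration = float("inf")
    for cost, duration in sorted(set(paths)):
        if duration < best_duration:
            front.append((cost, duration))
            best_duration = duration
    return front


candidates = [(120.0, 14.0), (100.0, 16.0), (130.0, 15.0), (100.0, 18.0), (90.0, 20.0)]
front = non_dominated(candidates)
```

Here the front keeps the cheapest-but-slowest, the intermediate, and the fastest-but-costliest options, while (130.0, 15.0) is dropped because (120.0, 14.0) is better in both objectives.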

arXiv.org e-Print Archive

Archivio istituzionale della ricerca - Università di Brescia

SGDE: Secure Generative Data Exchange for Cross-Silo Federated Learning

Author: Alberto Archetti
Eugenio Lomurno
Leonardo Di Perna
Lorenzo Cazzella
Matteo Matteucci
Stefano Samele
Publication venue: ACM
Publication date: 01/01/2022

Privacy regulations, such as the GDPR, impose transparency and security as design pillars for data processing algorithms. In this context, federated learning is one of the most influential frameworks for privacy-preserving distributed machine learning, achieving strong results in many natural language processing and computer vision tasks. Several federated learning frameworks employ differential privacy to prevent private data leakage to unauthorized parties and malicious attackers. Many studies, however, highlight the vulnerabilities of standard federated learning to poisoning and inference attacks, raising concerns about potential risks for sensitive data. To address this issue, we present SGDE, a generative data exchange protocol that improves user security and machine learning performance in a cross-silo federation. The core idea of SGDE is to share data generators, trained on private data with strong differential privacy guarantees, instead of communicating explicit gradient information. These generators synthesize an arbitrarily large amount of data that retain the distinctive features of the private samples while differing substantially from them. In this work, SGDE is tested in a cross-silo federated network on image and tabular datasets, using beta-variational autoencoders as data generators. The results show that including SGDE improves task accuracy and fairness, as well as resilience to the most influential attacks on federated learning.
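The central idea of SGDE is that silos exchange trained data generators rather than gradients, and participants then draw synthetic samples from the generators they receive. The toy sketch below shows only that data flow. It replaces the beta-variational autoencoders with a per-feature Gaussian generator and omits differentially private training entirely, so it offers no privacy guarantee; the class and function names are hypothetical.

```python
import random
import statistics
from dataclasses import dataclass
from typing import List, Tuple

Rows = List[List[float]]


@dataclass
class GaussianGenerator:
    """Toy stand-in for a trained generator: an independent Gaussian per feature."""

    means: List[float]
    stdevs: List[float]

    @classmethod
    def fit(cls, rows: Rows) -> "GaussianGenerator":
        columns = list(zip(*rows))
        return cls(
            means=[statistics.fmean(c) for c in columns],
            stdevs=[statistics.pstdev(c) for c in columns],
        )

    def sample(self, n: int, rng: random.Random) -> Rows:
        return [[rng.gauss(m, s) for m, s in zip(self.means, self.stdevs)] for _ in range(n)]


def exchange_and_synthesize(
    silos: List[Rows], n_per_generator: int, seed: int = 0
) -> Tuple[List[GaussianGenerator], Rows]:
    """Each silo publishes a generator once; a participant pools synthetic samples."""
    rng = random.Random(seed)
    # Only generator parameters leave each silo, never the private rows.
    shared = [GaussianGenerator.fit(rows) for rows in silos]
    synthetic: Rows = []
    for generator in shared:
        synthetic.extend(generator.sample(n_per_generator, rng))
    return shared, synthetic
```

The pooled synthetic rows could then be used to train a task model locally. In a real deployment the generator would be a deep model trained with differential privacy, which this toy does not attempt.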

arXiv.org e-Print Archive

Archivio istituzionale della ricerca - Politecnico di Milano

PORTO@iris (Publications Open Repository TOrino - Politecnico di Torino)

Towards cross-cohort estimation of cognitive decline in neurodegenerative diseases

Author: Archetti Damiano
Durrleman Stanley
Koval Igor
Maheux Etienne
Redolfi Alberto
Publication venue: HAL CCSD
Publication date: 24/08/2020

Heterogeneity of cohorts, in terms of inclusion criteria, design of follow-up visits, and batteries of cognitive assessments, hinders thorough comparisons between them. For that reason, we build a cross-cohort model of cognitive decline that can be personalized to any patient, allowing partially or totally missing scores to be imputed. This enables an individual-level comparison of disease progression in subjects from different cohorts, with temporal realignment and over a broader set of biomarkers.
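The abstract mentions temporal realignment and score imputation without giving the model. Disease progression models in this line of work often map each subject's age onto a shared population timeline with an individual time shift and acceleration factor, then read the score off a population curve; the sketch below assumes that form with a logistic curve, which may differ from the paper's actual model. The parameter names (`t0`, `v0`, `tau`, `xi`) and the default values are illustrative assumptions, not fitted estimates.

```python
import math
from dataclasses import dataclass


@dataclass
class Subject:
    tau: float  # individual time shift in years: positive means later onset
    xi: float   # log-acceleration: exp(xi) > 1 means faster progression


def realigned_time(t: float, subject: Subject, t0: float) -> float:
    """Map a subject's age t onto the population disease timeline."""
    return math.exp(subject.xi) * (t - t0 - subject.tau) + t0


def population_curve(s: float, t0: float, v0: float) -> float:
    """Normalized score in (0, 1) along the population timeline; 0.5 at s = t0."""
    return 1.0 / (1.0 + math.exp(-v0 * (s - t0)))


def predicted_score(t: float, subject: Subject, t0: float = 70.0, v0: float = 0.3) -> float:
    """Predict (or impute, if unobserved) a subject's normalized score at age t."""
    return population_curve(realigned_time(t, subject, t0), t0, v0)
```

Because every subject is placed on the same timeline, scores from cohorts with different visit schedules or test batteries can be compared, and missing scores can be filled in from the personalized curve.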

INRIA a CCSD electronic archive server

Differences Between Plasma and Cerebrospinal Fluid p-tau181 and p-tau231 in Early Alzheimer's Disease

Author: Archetti Silvana
Ashton Nicholas J
Battaglio Beatrice
Benussi Alberto
Bonzi Giulio
Caratozzolo Salvatore
Cosseddu Maura
Ferrari Elisabetta
Giliani Silvia
Mensi Lorenza
Padovani Alessandro
Parigi Marta
Pilotto Andrea
Turrone Rosanna
Zetterberg Henrik
Publication venue: IOS Press
Publication date: 05/04/2022

Plasma phosphorylated tau species have recently been proposed as peripheral markers of Alzheimer's disease (AD) pathology. In this cross-sectional study including 91 subjects, plasma and cerebrospinal fluid (CSF) p-tau181 and p-tau231 levels were elevated in the early symptomatic stages of AD. Plasma p-tau231 and p-tau181 were strongly related to CSF phosphorylated tau, total tau, and amyloid, and identified AD already in its early stage with high accuracy, close to that of CSF p-tau231 and p-tau181. The findings might support their use as diagnostic and prognostic peripheral AD biomarkers in both research and clinical settings.
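The abstract reports high accuracy in identifying AD but does not name the statistic; for a continuous biomarker, diagnostic accuracy is commonly summarized by the area under the ROC curve (AUC), which is an assumption here. As a reminder of what that number means, the sketch below computes AUC as the probability that a randomly chosen case has a higher marker level than a randomly chosen control, with ties counting as half. The function name is hypothetical and the values in the test are invented.

```python
from typing import Sequence


def roc_auc(case_values: Sequence[float], control_values: Sequence[float]) -> float:
    """AUC via the Mann-Whitney formulation: P(case > control) + 0.5 * P(tie)."""
    if not case_values or not control_values:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for c in case_values:
        for k in control_values:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(case_values) * len(control_values))
```

An AUC of 1.0 means the marker separates cases from controls perfectly, while 0.5 means it does no better than chance.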

UCL Discovery

Rare mutations in SQSTM1 modify susceptibility to frontotemporal lobar degeneration

Author: Alexopoulos Panagiotis
Almeida Maria Rosário
Archetti Silvana
Bagnoli Silvia
Benussi Luisa
Binetti Giuliano
Boada Mercè
Bonvicini Christian
Borroni Barbara
Chiang Huei-Hsin
Clarimón Jordi
Cras Patrick
Cruts Marc
De Deyn Peter Paul
De Jonghe Peter
de Mendonça Alexandre
Dermaut Bart
Deschamps William
Diehl-Schmid Janine
Dillen Lubina
do Couto Frederico Simões
Dols-Icardo Oriol
Engelborghs Sebastiaan
Fabrizi Gian Maria
Frisoni Giovanni B
Gelpi Ellen
Ghidoni Roberta
Graff Caroline
Haack Tobias B
Heneka Michael T
Hernández Isabel
Jessen Frank
Jordanova Albena
Kovacs Gabor G
Laureys Annelies
Llado Albert
Lleó Alberto
Maetzler Walter
Martin Jean-Jacques
Mattheijssens Maria
Matěj Radoslav
Merlin Céline
Miltenberger-Miltényi Gabriel
Müller vom Hagen Jennifer
Nacmias Benedetta
Ortega-Cubero Sara
Padovani Alessandro
Parobkova Eva
Pastor Pau
Peeters Karin
Perneczky Robert
Prokisch Holger
Ramirez Alfredo
Razquin Cristina
Robberecht Wim
Ruiz Agustín
Salmon Eric
Sanchez-Valle Raquel
Santana Isabel
Santens Patrick
Santiago Beatriz
Sarafov Stayko
Schöls Ludger
Sieben Anne
Sleegers Kristel
Smets Katrien
Sorbi Sandro
Strom Tim M
Ströbel Thomas
Synofzik Matthis
Testi Silvia
Thonberg Håkan
Tournev Ivailo
Van Broeckhoven Christine
Van Damme Philip
Van Den Broeck Marleen
van der Zee Julie
Van Langenhove Tim
Vandenberghe Rik
Vandenbulcke Mathieu
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2014

Mutations in the gene coding for Sequestosome 1 (SQSTM1) have been genetically associated with amyotrophic lateral sclerosis (ALS) and Paget disease of bone. In the present study, we analyzed the SQSTM1 coding sequence for mutations in an extended cohort of 1,808 patients with frontotemporal lobar degeneration (FTLD), ascertained within the European Early-Onset Dementia consortium. As a control dataset, we sequenced 1,625 European control individuals and analyzed whole-exome sequence data of 2,274 German individuals (total n = 3,899). Association of rare SQSTM1 mutations was calculated in a meta-analysis of 4,332 FTLD and 10,240 control alleles. We identified 25 coding variants in FTLD patients, 10 of which had not been described before. Fifteen mutations were absent in the control individuals (carrier frequency < 0.00026), whilst the others were rare in both patients and control individuals. When pooling all variants with a minor allele frequency < 0.01, an overall frequency of 3.2 % was calculated in patients. Rare variant association analysis between patients and controls showed no difference over the whole protein, but suggested that rare mutations clustering in the UBA domain of SQSTM1 may influence disease susceptibility by doubling the risk for FTLD (RR = 2.18 [95 % CI 1.24-3.85]; corrected p value = 0.042). Detailed histopathology demonstrated that mutations in SQSTM1 are associated with widespread neuronal and glial phospho-TDP-43 pathology. With this study, we provide further evidence for a putative role of rare mutations in SQSTM1 in the genetic etiology of FTLD and show that, comparable to other FTLD/ALS genes, SQSTM1 mutations are associated with TDP-43 pathology.
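Two pieces of arithmetic in this abstract can be checked directly. The control total is 1,625 + 2,274 = 3,899, and a variant carried by no control has a carrier frequency below one in 3,899 (approximately 0.000256), consistent with the stated < 0.00026. The sketch below also shows a standard way to obtain a relative risk with a 95% confidence interval from allele counts, the log (Katz) method. The abstract does not give the underlying counts or state which method was used, and a meta-analysis pools several strata rather than using a single table, so this is illustrative only; the counts in the test are hypothetical and do not reproduce RR = 2.18.

```python
import math
from typing import Tuple

# Control individuals: 1,625 sequenced plus 2,274 whole-exome samples.
controls = 1625 + 2274
# If no control carries a variant, its carrier frequency is below one in `controls`.
carrier_frequency_bound = 1 / controls


def relative_risk_ci(
    a: int, n1: int, c: int, n2: int, z: float = 1.96
) -> Tuple[float, float, float]:
    """Single-table relative risk of a variant allele in group 1 vs group 2.

    a / n1: variant alleles / total alleles in patients
    c / n2: variant alleles / total alleles in controls
    Returns (rr, lower, upper) using the standard error of log(RR).
    """
    if min(a, c) == 0:
        raise ValueError("zero counts need a continuity correction")
    rr = (a / n1) / (c / n2)
    se_log = math.sqrt(1 / a - 1 / n1 + 1 / c - 1 / n2)
    return rr, rr * math.exp(-z * se_log), rr * math.exp(z * se_log)
```

The interval is symmetric on the log scale, which is why the reported CI (1.24-3.85) is not symmetric around 2.18.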

Springer - Publisher Connector

Ghent University Academic Bibliography

PubMed Central

Corrigendum to Dissemination in time and space in presymptomatic granulin mutation carriers: A spatial chronnectome study [Neurobiology of Aging Volume 108, December 2021, Pages 155-167]

Author: Archetti Silvana
Benussi Alberto
Bocchetta Martina
Butler Chris R.
Calhoun Vince D.
Cash Dave
Convery Rhian
de Mendonça Alexandre
Finger Elizabeth
Galimberti Daniela
Gasparotti Roberto
Gazzina Stefano
Giunta Marcello
Graff Caroline
Iraji Armin
Jiskoot Lize
Laforce Robert
Masellis Mario
Moreno Fermin
Peakman Georgia
Premi Enrico
Rachakonda Srinivas
Rowe James B.
Sanchez-Valle Raquel
Synofzik Matthis
Tagliavini Fabrizio
Tartaglia Carmela
Todd Emily
van Swieten John C.
Vandenberghe Rik
Publication venue: Scholarship@Western
Publication date: 01/11/2022

Scholarship@Western